---
title: Use voice mode
slug: /concepts-voice-mode
---

import Icon from "@site/src/components/icon";

:::info
Voice mode is not available in Langflow Desktop.
To use voice mode, [Install the Langflow OSS Python package](/get-started-installation#install-and-run-the-langflow-oss-python-package).
:::

You can use Langflow's voice mode to interact with your flows verbally through a microphone and speakers.

## Prerequisites

Voice mode requires the following:

* A flow with **Chat Input**, **Language Model**, and **Chat Output** components.

    If your flow has an **Agent** component, make sure the tools in your flow have accurate names and descriptions to help the agent choose which tools to use.

    Additionally, be aware that voice mode overrides typed instructions in the **Agent** component's **Agent Instructions** field.

* An [OpenAI](https://platform.openai.com/) account and an OpenAI API key because Langflow uses the OpenAI API to process voice input and generate responses.

* Optional: An [ElevenLabs](https://elevenlabs.io) API key to enable more voice options for the LLM's response.

* A microphone and speakers.

    A high quality microphone and minimal background noise are recommended for optimal voice comprehension.

## Test voice mode in the Playground

In the **Playground**, click the <Icon name="Mic" aria-hidden="true"/> **Microphone** to enable voice mode and verbally interact with your flows through a microphone and speakers.

The following steps use the **Simple Agent** template to demonstrate how to enable voice mode:

1. Create a flow based on the **Simple Agent** template.

2. Add your **OpenAI API key** credentials to the **Agent** component.

3. Click **Playground**.

4. Click the <Icon name="Mic" aria-hidden="true"/> **Microphone** icon to open the **Voice mode** dialog.

5. Enter your OpenAI API key, and then click **Save**. Langflow saves the key as a [global variable](/configuration-global-variables).

6. If you are prompted to grant microphone access, you must allow microphone access to use voice mode.
If microphone access is blocked, you won't be able to provide verbal input.

7. For **Audio Input**, select the input device to use with voice mode.

8. Optional: Add an ElevenLabs API key to enable more voices for the LLM's response.
Langflow saves this key as a global variable.

9. For **Preferred Language**, select the language you want to use for your conversations with the LLM.
This option changes both the expected input language and the response language.

10. Speak into your microphone to start the chat.

    If configured correctly, the waveform registers your input, and then the agent's logic and response are described verbally and in the **Playground**.

## Develop applications with websockets endpoints

Langflow exposes two OpenAI Realtime API-compatible websocket endpoints for your flows.
You can build applications against these endpoints the same way you would build against [OpenAI Realtime API websockets](https://platform.openai.com/docs/guides/realtime#connect-with-websockets).

The Langflow API's websockets endpoints require an [OpenAI API key](https://platform.openai.com/docs/overview) for authentication, and they support an optional [ElevenLabs](https://elevenlabs.io) integration with an ElevenLabs API key.

Additionally, both endpoints require that you provide the flow ID in the endpoint path.

### Voice-to-voice audio streaming

The `/ws/flow_as_tool/$FLOW_ID` endpoint establishes a connection to OpenAI Realtime voice, and then invokes the specified flow as a tool according to the [OpenAI Realtime model](https://platform.openai.com/docs/guides/realtime-conversations#handling-audio-with-websockets).

This approach is ideal for low latency applications, but it is less deterministic because the OpenAI voice-to-voice model determines when to call your flow.

### Speech-to-text audio transcription

The `/ws/flow_tts/$FLOW_ID` endpoint converts audio to text using [OpenAI Realtime voice transcription](https://platform.openai.com/docs/guides/realtime-transcription), and then directly invokes the specified flow for each transcript.

This approach is more deterministic but has higher latency.

This is the mode used in the Langflow **Playground**.

### Session IDs for websockets endpoints

Both endpoints accept an optional `/$SESSION_ID` path parameter to provide a unique ID for the conversation.
If omitted, Langflow uses the flow ID as the [session ID](/session-id).

However, be aware that voice mode only maintains context within the current conversation instance.
When you close the **Playground** or end a chat, verbal chat history is discarded and not available for future chat sessions.

## See also

* [Test flows in the Playground](/concepts-playground)